Abstract



A search is conducted for a target moving in discrete time 
between a finite number of cells according to a known Markov 
process. The set of cells available for search in a given time 
period is a function of the cell searched in the previous time 
period. The problem is formulated and solved as a partially 
observable Markov decision process (POMDP). A finite time
horizon POMDP solution technique is presented which is simpler 
than the standard linear programming methods. 



THE OPTIMAL SEARCH FOR A MOVING TARGET WHEN THE SEARCH PATH IS CONSTRAINED

1. Problem Statement

A discrete time search is conducted for a target moving
between a finite set of cells $C = \{1,\dots,N\}$. At the beginning
of each time period, one cell is searched. If cell $i$ was
searched in the previous time period, the current search cell
must be selected from the set $C_i \subseteq C$. If the target is in the
selected cell $k$, it is detected with probability $q_k \in [0,1]$. If
the target is not in the cell searched, it cannot be detected
during the current time period. After an unsuccessful search,
a target in cell $i$ moves to cell $j$ with probability $p_{ij}$
for the next time period. The transition matrix, $P = [p_{ij}]$,
is known to the searcher. The object of the search is to maximize
the $T$-time period probability of detection.

2. Background

The moving target problem has received considerable attention,
much of it recent. Washburn [1980] and Stone and Kadane
[1981] list the important references. Pollock [1970] solved
the problem addressed here for $N = 2$ and $C_1 = C_2 = C$.

Washburn [1980] and Brown [1980] introduced a powerful technique
giving exact solutions for the $N$-cell case, if all cells are
available for search in each time period (i.e., $C_i = C$,
$i = 1,\dots,N$), search effort can be infinitesimally divided
between the cells, and the detection function is exponential.

Stewart [1980] adapted this technique to the search problem
considered here by using branch-and-bound methods. As Stewart
observed, however, the nonconvexity of the space of possible
search plans allows this method to converge to suboptimal solutions.

Smallwood and Sondik [1973] and Monahan [1982] noted that
the 2-cell problem solved by Pollock [1970] could be modelled
as a partially observable Markov decision process (POMDP) and 
that an N-cell extension was possible. This paper makes that 
extension and, in addition, allows that the set of possible 
search cells in a given time period be a function of the search 
cell selected in the previous time period. This permits 
searches to be modelled where the searcher can travel only 
a limited distance between time periods. Thus, the search cell 
in a given time period must be within some specified neighbor- 
hood of the search cell in the previous time period. 

This paper also reports a finite time horizon POMDP solution
technique that is simpler than the standard linear programming
techniques (e.g., Monahan [1982]) and that, in initial
computational experience, executes more quickly.

3. Mathematical Development

As is standard for many problems exploiting a Markov assumption,
the solution technique used here is dynamic programming.
This method requires that the process being modelled be
defined in terms of a sufficient statistic (Bertsekas [1976],
p. 122). Following Sondik [1971], Smallwood and Sondik [1973]
and Platzman [1980], we use the row vector $(\pi(k), i) \in R^{N+1}$, where
$\pi_j(k)$ = Pr{the target is in cell $j$ at the beginning of time
period $k$, given unsuccessful search in all previous time
periods}, and $i \in C$ is the cell searched in the previous time
period. If the dependence on $k$ is clear from context, $\pi(k)$
will be written as $\pi$. The state space then becomes $\Pi \times C$,
where

$$\Pi = \{\pi \in R^N \mid \pi 1 = 1,\ \pi \ge 0\},$$

and 1 and 0 can be either vectors or scalars. The vector
inequality $a \ge b$ means $a_i \ge b_i$, $\forall i$.

Following the dynamic programming convention of labelling
"backwards in time", we define $V_n(\pi,i)$ to be the maximum
obtainable probability of detection with $n$ time periods remaining
and a current state vector $(\pi,i)$. Let $T_j(\pi) \in \Pi$ be $\pi$
updated for unsuccessful search in cell $j$, using Bayes's rule.
That is,

$$T_j(\pi) = (1 - q_j \pi_j)^{-1}\, \pi P_j, \tag{1}$$

where $P_j \in R^{N \times N}$ is $P$ with row $j$ multiplied by $(1 - q_j)$.
If $q_j \pi_j = 1$, then the search in the current time period detects
the target with certainty, and (1) is not defined.



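We can now write $V_n(\pi,i)$ in terms of $V_{n-1}(\pi',j)$ as follows:

$$V_n(\pi,i) = \max_{j \in C_i} \left\{ q_j \pi_j + (1 - q_j \pi_j)\, V_{n-1}(T_j(\pi), j) \right\}, \tag{2}$$

with $V_0(\pi,i) = 0$.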
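The Bayes update (1) and the recursion (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `bayes_update` and `value`, the 0-based cell numbering, and the three-cell data are hypothetical and do not come from the text. The direct recursion enumerates every feasible search path, so it is practical only for small $n$.

```python
def bayes_update(pi, P, q, j):
    """Equation (1): T_j(pi) = pi P_j / (1 - q_j pi_j), with 0-based cells."""
    miss = 1.0 - q[j] * pi[j]
    if miss <= 0.0:
        raise ValueError("detection is certain; (1) is not defined")
    n = len(pi)
    # Unnormalized belief just after the unsuccessful look in cell j.
    post = [p * (1.0 - q[j]) if k == j else p for k, p in enumerate(pi)]
    # The target then moves one step according to P; finally renormalize.
    moved = [sum(post[r] * P[r][c] for r in range(n)) for c in range(n)]
    return [m / miss for m in moved]


def value(n, pi, i, P, q, allowed):
    """Equation (2) by direct recursion; allowed[i] plays the role of C_i."""
    if n == 0:
        return 0.0
    best = 0.0
    for j in allowed[i]:
        hit = q[j] * pi[j]
        cont = 0.0
        if hit < 1.0:
            nxt = bayes_update(pi, P, q, j)
            cont = (1.0 - hit) * value(n - 1, nxt, j, P, q, allowed)
        best = max(best, hit + cont)
    return best


# Hypothetical three-cell problem (illustrative data only).
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
q = [0.8, 0.8, 0.8]
allowed = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
pi = [0.2, 0.3, 0.5]

updated = bayes_update(pi, P, q, 2)
print(updated)
print(value(3, pi, 1, P, q, allowed))
```

With one period remaining, the recursion reduces to $\max_{j \in C_i} q_j \pi_j$, which gives a quick check on any implementation.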

Equation (2) is the dynamic programming recursion that
must be solved in each time period. It looks formidable,
primarily because $\pi$ is real rather than discrete. We will
show, however, that $V_n(\pi,i)$ may be expressed in a particularly
simple form. Namely,

$$V_n(\pi,i) = \max_{a \in A(n,i)} \pi a, \tag{3}$$

where $A(n,i)$ is a finite collection of $N$-vectors. The
dynamic programming problem then becomes one of constructing
$A(n,i)$ from $A(n-1,j)$.

If $C_i = C$, $\forall i$, then the search problem as formulated
becomes a standard POMDP and can be solved using the linear
programming methods of Sondik [1971], Smallwood and Sondik [1973],
or Monahan [1982]. Allowing that the action selected in the
previous time period can constrain the actions available in the
present time period requires an augmented state space
($\Pi \times C$ vice $\Pi$) and represents a generalization of the
standard model. However, as the next theorem shows, the basic
form of the POMDP solution remains the same. Specifically, $V_n(\pi,i)$
is piecewise linear and convex.

Theorem: For $n = 0,\dots,T$, $V_n(\pi,i)$ is piecewise linear and
convex in $\pi$. That is,

$$V_n(\pi,i) = \max_{a \in A(n,i)} \pi a, \tag{4}$$

where $A(n,i)$ is a finite set of $N$-vectors.

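Proof: We proceed by induction. (4) holds trivially for $n = 0$
and $A(0,i) = \{0\}$. For $n = 1$, it also holds, since from (2),

$$V_1(\pi,i) = \max_{j \in C_i} q_j \pi_j = \max_{a \in A(1,i)} \pi a,$$

where $A(1,i) = \{q_j e_j \mid j \in C_i\}$ and $e_j$ is the column
$N$-vector with a 1 in the $j$th place and 0's elsewhere.

Now assume (4) holds in time period $(n-1)$. From (2),

$$
\begin{aligned}
V_n(\pi,i) &= \max_{j \in C_i} \Big\{ q_j \pi_j + (1 - q_j \pi_j) \max_{a^j \in A(n-1,j)} T_j(\pi)\, a^j \Big\} \\
&= \max_{j \in C_i} \Big\{ q_j \pi_j + \max_{a^j \in A(n-1,j)} \pi P_j a^j \Big\} \\
&= \max_{j \in C_i}\ \max_{a^j \in A(n-1,j)} \pi \left( e_j q_j + P_j a^j \right) \\
&= \max_{a \in A(n,i)} \pi a,
\end{aligned}
\tag{5}
$$

where

$$A(n,i) = \{ a \in R^N \mid a = e_j q_j + P_j a^j ;\ j \in C_i \text{ and } a^j \in A(n-1,j) \}. \tag{6}$$

So $V_n(\pi,i)$ is of the proper form and the proof is complete.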
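As a small worked illustration of (6), consider a hypothetical two-cell problem (the data are assumed for illustration and are not from the text) in which the target never moves, so $P = I$, with $q_1 = q_2 = .5$ and $C_1 = C_2 = \{1,2\}$. Then $P_1 = \mathrm{diag}(.5, 1)$, $P_2 = \mathrm{diag}(1, .5)$, and

$$A(1,i) = \{(.5, 0)',\ (0, .5)'\}.$$

Applying (6) with $j = 1$ and then $j = 2$ gives

$$
\begin{aligned}
e_1 q_1 + P_1 (.5, 0)' &= (.5, 0)' + (.25, 0)' = (.75, 0)', \\
e_1 q_1 + P_1 (0, .5)' &= (.5, 0)' + (0, .5)' = (.5, .5)', \\
e_2 q_2 + P_2 (.5, 0)' &= (0, .5)' + (.5, 0)' = (.5, .5)', \\
e_2 q_2 + P_2 (0, .5)' &= (0, .5)' + (0, .25)' = (0, .75)',
\end{aligned}
$$

so $A(2,i) = \{(.75, 0)',\ (.5, .5)',\ (0, .75)'\}$. For $\pi = (.5, .5)$, (4) gives $V_2(\pi,i) = \max\{.375,\ .5,\ .375\} = .5$. This agrees with (2): searching cell 1 yields $q_1 \pi_1 = .25$, the update (1) gives $T_1(\pi) = (1/3, 2/3)$, and $(1 - .25)\, V_1(T_1(\pi), 1) = .75 \times 1/3 = .25$.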

For any finite $n$ and $i \in C$, $A(n,i)$ is a finite set.
However, using (6) to generate $A(n,i)$ and assuming (for
illustration purposes only) that the number of elements in
$C_i$ is $M$ for all $i \in C$, the number of vectors in $A(n,i)$ is $M$
times the number in $A(n-1,i)$. Since there are $M$ vectors in
$A(1,i)$, there are apparently $M^n$ vectors in $A(n,i)$. This
equals the number of possible search paths for the $n$-time period
problem that begin with cell $i$ and suggests that total enumeration
of search paths might be as effective as this procedure.

Fortunately, this is not necessarily the case. Following
Smallwood and Sondik [1973], we note that some of the vectors in
$A(n,i)$ can be removed and the maximization of (5) left unchanged.
We say that $\hat{a} \in A(n,i)$ is dominated if for every $\pi \in \Pi$,

$$\max_{a \in A(n,i)} \pi a = \max_{\substack{a \in A(n,i) \\ a \ne \hat{a}}} \pi a. \tag{7}$$

Dominated vectors can be removed from $A(n,i)$ and need not be
used in the construction of $A(n+1,j)$.

Sondik [1971] first provided a linear programming technique
to identify dominated vectors for the POMDP. Following a slight
modification in Monahan [1982], we solve the following linear
program to check $\hat{a} \in A(n,i)$ for dominance:

$$
\begin{aligned}
\min_{\pi,\,x}\quad & x - \pi \hat{a} \\
\text{s.t.}\quad & x \ge \pi a, \quad \forall a \in A(n,i) \text{ but } a \ne \hat{a}, \\
& \pi \in \Pi.
\end{aligned}
\tag{8}
$$

Whenever the minimal value of $x - \pi \hat{a}$ is non-negative, $\hat{a}$ is
dominated and can be removed from $A(n,i)$. The linear programming
solution technique need not necessarily continue to optimality.
As soon as the objective function becomes negative, $\hat{a}$
is determined to be not dominated. (This method is similar to
the branch-and-bound technique of Stewart [1980] in that both
are enumerative procedures to systematically eliminate search
paths which cannot be optimal.)

Once the reduced vector sets $A(n,i)$ have been generated
for all $i \in C$ and $n = 0,\dots,T$, the maximum probability of
detection and the optimal $T$-time period search plan can be
determined for any initial target distribution $\pi$. Assume that
before the search begins, the searcher is in cell $i$, and thus
the initial search cell must be in $C_i$. Then the maximum
obtainable $T$-time period probability of detection is

$$\max_{a \in A(T,i)} \pi a. \tag{9}$$

(If the searcher's starting cell, $i$, can be any element in $C$,
(9) is maximized over all $i \in C$ to find the maximum probability
of detection.) The cell searched in time period $T$ is that
used in (6) to construct the argmax of (9). If cell $j$
is searched in time period $T$ and the target is not detected,
then (9) is re-solved for time period $T-1$ with $T_j(\pi)$ replacing
$\pi$ and $A(T-1,j)$ replacing $A(T,i)$.

Alternatively (and perhaps more simply), one can note that
each $a \in A(T,i)$ has associated with it, not just the cell searched
in time period $T$, but a series of $T$ cells, built up by the
sequential application of (6). When a particular $a \in A(T,i)$
maximizes (9), the sequence of cells associated with the vector
$a$ is the optimal search path.
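The bookkeeping described above, in which each vector carries the sequence of cells that produced it, might be sketched as follows. This is a minimal Python illustration, not the code used for the paper: the name `solve_with_paths`, the 0-based cell numbering, and the three-cell data are assumptions, and only exact duplicate vectors are merged (no dominance check is applied).

```python
def solve_with_paths(T, P, q, allowed, pi, start):
    """Build A(n, i) by (6), tagging each vector with its search path,
    then evaluate (9) for the belief `pi` and previous cell `start`."""
    N = len(P)
    # sets[i] maps each vector of A(n, i) to the path that generated it.
    sets = {i: {tuple([0.0] * N): ()} for i in range(N)}
    for _ in range(T):
        # step[j]: vectors e_j q_j + P_j a^j for a^j in A(n-1, j).
        step = {}
        for j in range(N):
            ext = {}
            for a, path in sets[j].items():
                v = tuple(
                    ((1.0 - q[j]) if r == j else 1.0)
                    * sum(P[r][c] * a[c] for c in range(N))
                    + (q[j] if r == j else 0.0)
                    for r in range(N)
                )
                # Search cell j now, then follow the path attached to a^j.
                ext.setdefault(v, (j,) + path)
            step[j] = ext
        # A(n, i) is the union of step[j] over j in C_i.
        new_sets = {}
        for i in range(N):
            merged = {}
            for j in allowed[i]:
                for v, path in step[j].items():
                    merged.setdefault(v, path)
            new_sets[i] = merged
        sets = new_sets

    def score(vec):
        return sum(p * x for p, x in zip(pi, vec))

    best_vec = max(sets[start], key=score)
    return score(best_vec), sets[start][best_vec]


# Hypothetical three-cell problem (illustrative data only).
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
q = [0.8, 0.8, 0.8]
allowed = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
pi = [0.2, 0.3, 0.5]

best_value, best_path = solve_with_paths(3, P, q, allowed, pi, start=1)
print(best_value, best_path)
```

Storing the path with the vector avoids re-solving (9) after each unsuccessful search, at the cost of one tuple per retained vector.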
4. The Dual Definition of Dominance and a Geometric Interpretation
The linear programming dual of (8) is

$$
\begin{aligned}
\max\quad & v \\
\text{s.t.}\quad & \sum_{i=1}^{k} \lambda_i a_i - v \ge \hat{a}, \\
& \lambda 1 = 1, \\
& \lambda \ge 0,
\end{aligned}
\tag{10}
$$

where $i = 1,\dots,k$ indexes all vectors in $A(n,i)$ except $\hat{a}$
(the scalar $v$ is subtracted from every component).
The duality theorem of linear programming (Dantzig [1963],
p. 125 or Luenberger [1973], p. 72) states that the primal has
a finite optimal solution iff the dual has a finite optimal
solution; and when feasible optimal solutions exist, the two
optimal objective functions are equal.

We know that $\hat{a} \in A(n,i)$ is dominated when the minimal
value of the objective function of (8) is non-negative. In this
case, the duality theorem requires that (10) is feasible and
that the optimal value of $v$ is also non-negative. Thus, from
the constraints of (10), there exists a convex combination of
elements in $A(n,i)$ except $\hat{a}$ which (in a vector sense) is
greater than or equal to $\hat{a}$. And the strength of the duality
theorem allows the implication to hold in the other direction as
well. That is, if such a convex combination of vectors in
$A(n,i)$ exists, then $\hat{a}$ is dominated.

The dual characterization of dominance allows a simple
geometric interpretation. If $B$ is the convex hull of the
vectors in $A(n,i)$ except $\hat{a}$, then $\hat{a}$ is dominated iff

$$\exists\, b \in B \text{ such that } b \ge \hat{a}.$$
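A two-dimensional illustration, with vectors chosen here for convenience rather than taken from the text, shows the primal and dual tests agreeing. Let the set consist of $a_1 = (1, 0)'$, $a_2 = (0, 1)'$ and $\hat{a} = (.4, .4)'$, and write $\pi = (p, 1-p)$. The primal (8) becomes

$$\min_{0 \le p \le 1}\ \max\{p,\ 1-p\} - .4 = .5 - .4 = .1,$$

attained at $p = .5$. In the dual (10), the constraints read $\lambda_1 - v \ge .4$ and $\lambda_2 - v \ge .4$ with $\lambda_1 + \lambda_2 = 1$, so the largest feasible $v$ is obtained at $\lambda = (.5, .5)$:

$$v = .5 - .4 = .1.$$

Both optimal values equal $.1 \ge 0$, so $\hat{a}$ is dominated. Geometrically, $b = .5\,a_1 + .5\,a_2 = (.5, .5)'$ lies in $B$ and satisfies $b \ge \hat{a}$, although neither $a_1$ nor $a_2$ alone is greater than or equal to $\hat{a}$ in every component.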
5. Alternative Solution Techniques

The POMDP solution procedure described above requires
extensive calculations. To reduce $A(n,i)$ to its minimal size,
each $a \in A(n,i)$ must be checked for dominance by solving a
potentially large linear program. The question naturally arises
as to whether a simpler or more quickly executed procedure could
be found, even if such a procedure did not necessarily reduce
$A(n,i)$ to its minimal size.

What is possibly the simplest such reduction scheme is to
compare each $a_j$ and $a_k$ ($a_j \ne a_k$) in $A(n,i)$, and to
discard $a_j$ if $a_j \le a_k$ or $a_k$ if $a_k \le a_j$. The vectors
remaining can then be further reduced using linear programming
methods, or the larger-than-minimal $A(n,i)$ can be used directly
to construct $A(n+1,j)$ by (6). Both of these procedures were
coded for the IBM 3033 at the Naval Postgraduate School, and,
for the search problems examined, the latter method, using no
linear programming at all, generated optimal solutions more
quickly and required less computer storage. Both methods appeared
preferable to using only linear programming methods to check
for dominance.
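The pairwise comparison can be sketched in Python as below. The function name `prune_pairwise` and the sample vectors are assumptions made for illustration; the original implementation is not reproduced in the text.

```python
def prune_pairwise(vectors):
    """Discard any vector that is componentwise <= another vector in the set.

    Exact duplicates are kept once. The result may be larger than the
    minimal set, because a vector can be dominated by a convex
    combination of others without being below any single one of them.
    """
    kept = []
    for a in vectors:
        # Skip `a` if some retained vector is at least as large everywhere.
        if any(all(x <= y for x, y in zip(a, b)) for b in kept):
            continue
        # Drop retained vectors that `a` matches or exceeds everywhere.
        kept = [b for b in kept if not all(y <= x for x, y in zip(a, b))]
        kept.append(a)
    return kept


candidates = [(0.75, 0.0), (0.5, 0.5), (0.4, 0.4),
              (0.0, 0.75), (0.55, 0.3), (0.5, 0.5)]
reduced = prune_pairwise(candidates)
print(reduced)
```

In this sample, `(0.4, 0.4)` and the duplicate `(0.5, 0.5)` are removed, while `(0.55, 0.3)` survives even though it lies below the combination $.4\,(.75, 0) + .6\,(.5, .5) = (.6, .3)$; only a test such as (8) would remove it.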
6. An Example Problem

A simple 5-cell search problem is described by the following
parameters.



$$
P = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & .75 & .25 & 0 & 0 \\
0 & .25 & .5 & .25 & 0 \\
0 & 0 & .25 & .5 & .25 \\
0 & 0 & 0 & .25 & .75
\end{bmatrix}
$$

$C = \{1,2,3,4,5\}$

$C_1 = \{2\}$

$C_2 = \{2,3\}$

$C_3 = \{2,3,4\}$

$C_4 = \{3,4,5\}$

$C_5 = \{4,5\}$

$q_i = q$, $\forall i$

searcher's starting cell: 1

$T = 7$

$\pi(7) = (0,0,0,0,1)$

The target starts in cell 5 and the searcher in cell 1. Since
$C_1 = \{2\}$, the initial cell searched is 2. After the initial
search, cell 1 is inaccessible to both the searcher and the
target.

The optimal search path and the maximum obtainable probability
of detection ($P_d$) are given in Table 1 for $q$ of
.2, .4, .6, .8, and 1. Using the simplest reduction method
(i.e., no linear programming), the number of vectors in $A(7,1)$
increased from 3 for $q = 1$ to 187 for $q = .2$. The CPU time
required to obtain the optimal solution increased from 24 seconds
for $q = 1$ to 536 seconds for $q = .2$.



  q     optimal search path     P_d     # vectors in A(7,1)     CPU time (sec)

  .2    2 3 4 5 5 5 5           .357          187                    536
  .4    2 3 4 5 5 4 5           .594           89                    280
  .6    2 3 4 5 4 5 4           .757           49                    179
  .8    2 3 4 5 4 5 4           .867           26                    169
 1.0    2 3 4 5 4 3 2           .934            3                     24

Table 1. Example Problem Results
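The tabulated detection probabilities can be checked by evaluating each listed path directly: search the cell, remove the detected probability mass, then move the remaining mass with $P$. The Python sketch below does this for the example data. The function names `detection_probability` and `is_feasible` are assumptions made for illustration, and the check confirms only the value of a given path, not its optimality.

```python
def detection_probability(path, pi, P, q):
    """Probability of detecting the target along `path` (cells numbered from 1)."""
    n = len(pi)
    mass = list(pi)        # unnormalized: target in cell and not yet detected
    detected = 0.0
    for cell in path:
        j = cell - 1
        hit = q[j] * mass[j]
        detected += hit
        mass[j] -= hit     # remove the detected mass from the searched cell
        # The undetected target then moves one step according to P.
        mass = [sum(mass[r] * P[r][c] for r in range(n)) for c in range(n)]
    return detected


def is_feasible(path, start, allowed):
    """True if each searched cell lies in C_i for the previously searched cell i."""
    prev = start
    for cell in path:
        if cell not in allowed[prev]:
            return False
        prev = cell
    return True


P = [[1.0, 0.0, 0.0, 0.0, 0.0],
     [0.0, 0.75, 0.25, 0.0, 0.0],
     [0.0, 0.25, 0.5, 0.25, 0.0],
     [0.0, 0.0, 0.25, 0.5, 0.25],
     [0.0, 0.0, 0.0, 0.25, 0.75]]
allowed = {1: [2], 2: [2, 3], 3: [2, 3, 4], 4: [3, 4, 5], 5: [4, 5]}
pi0 = [0.0, 0.0, 0.0, 0.0, 1.0]

table_paths = {
    0.2: [2, 3, 4, 5, 5, 5, 5],
    0.4: [2, 3, 4, 5, 5, 4, 5],
    0.6: [2, 3, 4, 5, 4, 5, 4],
    0.8: [2, 3, 4, 5, 4, 5, 4],
    1.0: [2, 3, 4, 5, 4, 3, 2],
}

results = {}
for q_val, path in table_paths.items():
    results[q_val] = detection_probability(path, pi0, P, [q_val] * 5)
    print(q_val, path, is_feasible(path, 1, allowed), round(results[q_val], 3))
```

For the $q = 1$ row the evaluation gives $3825/4096$, which rounds to the tabulated .934.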